Journal of Molecular Evolution

Springer Science and Business Media LLC

Preprints posted in the last 30 days, ranked by how well they match Journal of Molecular Evolution's content profile, based on 21 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Short Interrupted Repeats Cassette (SIRC) ensembles of plant genomes reflects evolutionary route

Gorbenko, I. V.; Scherbakov, D. Y.; Zverintseva, K. M.; Konstantinov, Y. M.

2026-03-30 plant biology 10.64898/2026.03.27.714674 medRxiv
Top 0.1%
6.4%

Short Interrupted Repeats Cassettes (SIRC) are recently discovered eukaryotic DNA elements that possess many traits of satellite DNA and mobile genetic elements and consist of short direct repeats interspersed with diverse spacer sequences. The SIRC ensemble of an individual species is highly heterogeneous and cannot be studied using alignment methods. It was found that the number of similar SIRC sequences in a given pair of species is in general correlated with their taxonomic distance; at the same time, closely related species can possess highly diverged SIRC ensembles, which makes the SIRC evolutionary pattern closer to that of mobile genetic elements. The SIRC sequences form clusters with comparable sequence patterns that are likely to follow a doublet evolutionary model, which strongly suggests that the SIRC structure is maintained by selection. Several SIRC sequences of Arabidopsis were found to be of ancient origin, with an evolutionary history traceable as far back as the moss clade. We carried out unbiased detection of SIRC ensembles in 10 plant genomes and found that, despite very high intraspecies heterogeneity, SIRC sets possess a strong interspecies phylogenetic signal. Key message: Short Interrupted Repeats Cassettes are elements of ancient origin and could potentially be used to trace organism history and to facilitate synteny and Hi-C analysis.

2
A New Information Theoretic Approach Shows that Mixture Models Outperform Partitioned Models for Phylogenetic Analyses of Amino Acid Data

Ren, H.; Jiang, C.; Wong, T. K. F.; Shao, Y.; Susko, E.; Minh, B. Q.; Lanfear, R.

2026-03-18 evolutionary biology 10.64898/2026.03.16.712229 medRxiv
Top 0.1%
1.9%

Partitioned and mixture models are widely employed in Maximum Likelihood phylogenetic analyses of large genomic datasets. Comparing the fit of the two types of models has been challenging, because standard information-theoretic approaches cannot be applied. Mixture models are increasingly popular for the analysis of amino acid datasets and can lead to different conclusions compared to partitioned models. This raises an important question - which type of model tends to perform better? Susko et al. (2026) recently introduced the marginal Akaike information criterion (mAIC), which allows mixture models and partitioned models to be directly compared for the first time. Here, we use the mAIC and a range of other approaches to compare the fit of mixture and partitioned models across a diverse set of empirical datasets. We show that mixture models are universally favoured on amino acid datasets. This has important implications for interpreting empirical analyses and suggests that continued development of mixture models is an important avenue for future research.

3
Contrasting Species-Level and Genus Level Disparity Patterns within the ammonoid family Acanthoceratidae

Howard, L.; Wagner, P. J.

2026-03-23 paleontology 10.64898/2026.03.20.713222 medRxiv
Top 0.2%
1.3%

Paleobiologists commonly use genera as a proxy for species in biodiversity studies. However, a lingering concern is that patterns among genera might not always faithfully reflect patterns among species. To date, the concern has focused chiefly on measured patterns of richness over time and on implied origination and extinction rates. However, similar issues might arise for studies of morphological disparity. Moreover, there potentially are additional implications of disparity patterns among species versus those among genera concerning the range of observable anatomical characters and whether disparity within genera is comparable to disparity among genera. If clades have some relatively slowly changing characters that workers have used to denote different genera, then we would expect congeneric species to cluster in morphospace; however, if such characters are rare, then within-genus disparity might approach among-genus disparity. Here, we use genus-level and species-level disparity patterns among acanthoceratid ammonoids from the Late Cretaceous. In particular, we examine whether these different levels imply different evolutionary dynamics over a major ecological event (Ocean Anoxic Event 2) and how disparity within genera (i.e., among congeneric species) compares to disparity among genera. We find genus-level disparity somewhat inflates early acanthoceratid disparity but implies similar patterns over the OAE2. We also find that within-genus disparity is slightly lower than among-genus, but not hugely so. The combined results suggest that acanthoceratid shell anatomy does not really show "genus" level characters, even if congeneric species do tend to be more similar to each other than to species in other genera. Thus, this might provide more of a warning for other types of studies using anatomical data (e.g., phylogenetic studies) than for disparity studies.
Non-technical Summary: Many paleobiologists use genera to examine scientific questions. This leads to questions over whether this broader approach misses important species-level patterns. This study uses acanthoceratid ammonoids from the Late Cretaceous to examine disparity patterns at both the genus level and the species level. We specifically examine the disparity at both levels of this group over a time of high stress for this group, Ocean Anoxic Event 2 (OAE2). Our results show that genus-level disparity slightly exaggerates early acanthoceratid disparity but converges on a pattern similar to species-level disparity during OAE2. Within-genus disparity is shown to be slightly lower than among-genus, but not enough to be startling. Together, these results indicate that while species within the same genus tend to be more similar to each other than to those in other genera, there isn't a set of true "genus" level characters. This outcome leads to a warning against using anatomical data in phylogenetic studies, but less so for disparity studies.

4
An abstract model of nonrandom, non-Lamarckian mutation in evolution using a multivariate estimation-of-distribution algorithm

Vasylenko, L.; Livnat, A.

2026-04-01 evolutionary biology 10.64898/2026.03.30.715341 medRxiv
Top 0.4%
0.7%

At the fundamental conceptual level, two alternatives have traditionally been considered for how mutations arise and how evolution happens: 1) random mutation and natural selection, and 2) Lamarckism. Recently, the theory of Interaction-based Evolution (IBE) has been proposed, according to which mutations are neither random nor Lamarckian, but are influenced by information accumulating internally in the genome over generations. Based on the estimation-of-distribution algorithms framework, we present a simulation model that demonstrates nonrandom, non-Lamarckian mutation concretely while capturing indirectly several aspects of IBE: selection, recombination, and nonrandom, non-Lamarckian mutation interact in a complementary fashion; evolution is driven by the interaction of parsimony and fit; and random bits do not directly encode improvement but enable generalization by the manner in which they connect with the rest of the evolutionary process. Connections are drawn to Darwin's observations that changed conditions increase the rate of production of heritable variation; to the causes of bell-shaped distributions of traits and how these distributions respond to selection; and to computational learning theory, where analogizing evolution to learning in accord with IBE casts individuals as examples and places the learned hypothesis at the population level. The model highlights the importance of incorporating internal integration of information through heritable change in both evolutionary theory and evolutionary computation.

5
RNA-ligand complexes and the attenuation of neutral confinement in the evolution of RNA secondary structures

Loreto, A.; Ugalde, E.; Espinosa-Soto, C.

2026-03-29 evolutionary biology 10.64898/2025.12.19.695547 medRxiv
Top 0.4%
0.7%

RNA molecules with identical nucleotide sequence can adopt different structures. Mutations can alter their properties; for example, some mutations increase the stability of a functionally relevant structure at the expense of other structures' stability. Interestingly, the structural diversity that a sequence produces is correlated to the number of structures that it can access upon mutation. Thus, enhancing a structure's stability can lead to neutral confinement, an evolutionary dead-end in which mutational access to novel structures is increasingly difficult. If structure is critical to biological function, how do RNA molecules escape neutral confinement? We developed a model in which an RNA molecule's function depends on binding to a ligand and we applied it to study sequences that fold according to RNA biophysics, also simulating their evolution. Our analyses and simulations identified effects that decrease the selective advantage of augmenting a structure's stability. By disfavouring evolution of highly stable structures and favouring the accumulation of genetic variation, these effects hinder neutral confinement. The most important effect stems from the sequestration of high affinity structures in RNA-ligand complexes and their replenishment through thermal fluctuations. In this perspective, a common scenario may help to explain how RNA evolution avoids coming to a halt.

6
Distribution of genetic paternity in primate groups

Rosenbaum, S.; Grebe, N.; Silk, J. B.

2026-04-03 evolutionary biology 10.64898/2026.04.02.716091 medRxiv
Top 0.5%
0.6%

Understanding the distribution of paternity within social groups is critical for testing hypotheses about the evolution of behavior and morphology in primates, but assembling the requisite comparative data is a challenging task. We compiled genetic paternity data from 52 species of wild nonhuman primates along with information about socioecological, morphological, and life history traits that are relevant to understanding what proportion of offspring are sired by primary males (i.e., alpha males in multi-male groups and resident males in single-male groups). Our dataset, which currently contains information about 11 primate families and >3,000 individual paternities, is presented as a publicly accessible, living database designed to be updated as new data become available. Using Bayesian regression models, we investigated the role that phylogeny, group composition, and seasonality play in determining primary males' paternity share, and assessed the relative share of paternities obtained by non-primary residents versus extra-group males. First, we found that phylogeny has a detectable but relatively modest influence on primary males' paternity share. Species-level differences explained roughly 35-40% of variation in primary males' paternity share, and of that interspecific variation, ~50-70% was attributable to shared phylogenetic history. Second, group composition strongly predicted paternity share outcomes. Primary males in single-male/multi-female groups obtained the highest share of paternity (~80%), while those in multi-male groups had the lowest (~60%), though there was substantial variation within each category. Pair-living animals showed a striking split: males in cohesive pairs sired ~90% of offspring, while those in dispersed pairs sired only ~55%. Contrary to expectations, reproductive seasonality did not predict primary males' paternity share in any group type. Finally, when primary males in multi-male groups lost paternities, ~75% of losses were to other resident males. Overall, ~5-15% of offspring in these groups were sired by extra-group males. Our results largely confirm earlier findings based on smaller datasets, but also show that the relationship between social organization and paternity is more complicated than simple categorical predictions suggest. We discuss the gap between the data that would ideally be available for testing these hypotheses versus what currently exists, with hopes that our living database can help close this gap over time.

7
Estimating Bayesian phylogenetic information content using geodesic distances

Milkey, A.; Lewis, P. O.

2026-04-01 evolutionary biology 10.64898/2026.03.31.715656 medRxiv
Top 0.5%
0.5%

A new Bayesian measure of phylogenetic information content is introduced based on geodesic distances in treespace. The measure is based on the relative variance of phylogenetic trees sampled from the posterior distribution compared to the prior distribution. This ratio is expected to equal 1 if there is no information in the data about phylogeny and 0 if there is complete information. Trees can be scaled to have the same mean tree length to avoid dominance by edge length information and focus on topological information. The method scales well, requiring only that a valid sample can be obtained from both prior and posterior distributions. We show how dissonance (information conflict) among data sets can also be estimated. Both simulated and empirical examples are provided to illustrate that the new approach produces sensible and intuitive results.
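
The posterior-to-prior variance ratio described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "variance" can be approximated by the mean squared pairwise distance within a sample, and it represents each tree as a hypothetical tuple of edge lengths on one fixed topology so that plain Euclidean distance (`math.dist`) can stand in for a treespace geodesic. The names `Tree`, `mean_squared_distance`, and `variance_ratio` are illustrative only.

```python
import itertools
import math
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for a tree: edge lengths on a single fixed topology.
Tree = Tuple[float, ...]


def mean_squared_distance(
    sample: Sequence[Tree], dist: Callable[[Tree, Tree], float]
) -> float:
    """Mean squared pairwise distance, used here as a simple proxy for variance."""
    pairs = list(itertools.combinations(sample, 2))
    if not pairs:
        raise ValueError("need at least two trees in the sample")
    return sum(dist(a, b) ** 2 for a, b in pairs) / len(pairs)


def variance_ratio(
    prior: Sequence[Tree],
    posterior: Sequence[Tree],
    dist: Callable[[Tree, Tree], float] = math.dist,
) -> float:
    """Posterior spread divided by prior spread (1 = no information, 0 = complete)."""
    prior_spread = mean_squared_distance(prior, dist)
    if prior_spread == 0:
        raise ValueError("prior sample has no spread; ratio is undefined")
    return mean_squared_distance(posterior, dist) / prior_spread


prior = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
concentrated = [(1.0, 1.0)] * 3          # posterior collapsed onto a single tree
print(variance_ratio(prior, prior))         # 1.0: posterior no tighter than the prior
print(variance_ratio(prior, concentrated))  # 0.0: all posterior trees coincide
```

A real analysis would pass a geodesic distance between full trees as `dist`; the Euclidean default is only meaningful for this toy representation.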

8
Identification of a microRNA with a mutation in the loop structure in the silkworm Bombyx mori

Harada, M.; Tabara, M.; Kuriyama, K.; Ito, K.; Bono, H.; Sakamoto, T.; Nakano, M.; Fukuhara, T.; Toyoda, A.; Fujiyama, A.; Tabunoki, H.

2026-03-27 molecular biology 10.64898/2026.03.24.714027 medRxiv
Top 0.5%
0.5%

MicroRNAs (miRNAs) play essential roles in the posttranscriptional regulation of gene expression in organisms. In the process of synthesizing mature miRNAs from miRNA precursors, the precursors are cleaved by Dicer at their loop structure, after which they become mature and regulate transcription. However, the consequences of altering the loop sequence are not fully understood. The silkworm Bombyx mori is a lepidopteran insect with many genetic strains. In a silkworm strain with translucent larval skin, we identified a mutant of the miRNA miR-3260 in which part of the loop structure was lacking. Here, we aimed to analyze the role of wild-type miR-3260 and the influence of the loop-structure mutation in B. mori. First, we identified the genomic region responsible for the translucent larval skin phenotype and determined the mutated miR-3260 nucleotide sequence. Then, we predicted the binding partners of wild-type miR-3260 using the RNAhybrid tool and found two juvenile hormone (JH)-related genes as targets of wild-type miR-3260. Next, we assessed the relationship between miR-3260 and JH and found that miR-3260 was highly expressed in the corpora allata and that its expression responded to JH treatment. Meanwhile, miR-3260 mimic and inhibitor did not induce the typical phenotypes associated with JH in B. mori. Then, we compared the dicing products from wild-type and mutant miR-3260 precursors and observed that neither form underwent Dicer-mediated cleavage when the loop structure was altered. These results suggest that loop mutations in the miR-3260 precursor may not influence dicing activity, consistent with the lack of observable phenotypic effects.

9
A conserved isoleucine gates the diffusion of small ligands to the active site of NiFe CO-dehydrogenase

Opdam, L.; Meneghello, M.; Guendon, C.; Chargelegue, J.; Fasano, A.; Jacq-Bailly, A.; Leger, C.; Fourmond, V.

2026-03-21 biochemistry 10.64898/2026.03.19.713016 medRxiv
Top 0.5%
0.5%

CO dehydrogenases (CODH) are metalloenzymes that reversibly oxidize CO to CO2, at a buried NiFe4S4 active site. The substrates, CO and CO2, need therefore to be transported through the protein matrix to reach the active site. The most likely pathway for intra-protein diffusion is the hydrophobic channel identified in the crystal structures. Here, we use site-directed mutagenesis to study the highly conserved isoleucine 563 of Thermococcus sp. AM4 CODH2. Mutations at this position change the biochemical properties (KM for CO, product inhibition constant, catalytic bias...), and increase the resistance of the enzyme to the inhibitor O2, showing that isoleucine 563 indeed lines the gas channel. The I563F mutation decreases the bimolecular rate constant of inhibition by O2 15-fold, and increases the IC50 20-fold, which is the strongest improvement in O2 resistance reported so far. We show that the size of the introduced amino acids is less important than their flexibility - along with the size of the cavity formed near the active site in the channel. We also conclude that O2 access to the active site cannot be slowed down without also affecting CO diffusion. This tradeoff will have to be considered in further attempts to use site-directed mutagenesis to make CODHs more O2 tolerant.

10
The B. subtilis translesion polymerase Pol Y1 is not strongly recruited to sites of replication upon different types of DNA damage

Martinez-Whitman, S. R.; Santana, C. M.; Campbell, A. P.; Feldman, D. T.; Jabaley, I. E. Z.; O'Neal, L. G.; Marrin, M. E.; Thrall, E. S.

2026-04-03 biochemistry 10.64898/2026.04.02.716108 medRxiv
Top 0.7%
0.4%

One challenge to DNA replication is the presence of unrepaired damage on the template strand, which can stall the replication machinery. This stall can be resolved by the translesion synthesis (TLS) pathway, in which specialized translesion polymerases are recruited to copy damaged DNA. Because TLS polymerases are error-prone, their activity is regulated at multiple levels to minimize unnecessary mutagenesis. Although the molecular mechanisms of bacterial TLS have been extensively studied in Escherichia coli, less is known about this pathway in other species. In E. coli, the TLS polymerase Pol IV is minimally enriched at replication forks in the absence of DNA damage but is strongly recruited upon replication stalling, enabling TLS while minimizing mutagenesis. However, we recently showed that the Bacillus subtilis TLS polymerase Pol Y1, the homolog of Pol IV, is moderately enriched near replication sites even during normal growth and is not further enriched upon treatment with the DNA damaging agent 4-nitroquinoline 1-oxide (4-NQO). It is unknown whether this behavior is unique to 4-NQO or general to other types of DNA damage. In this study, we investigate the effects of four different DNA damaging agents (ultraviolet light, methyl methanesulfonate, nitrofurazone, and mitomycin C) in B. subtilis. We first characterize the contributions of the two TLS polymerases, Pol Y1 and Pol Y2, to DNA damage survival and damage-induced mutagenesis after treatment with these agents. We then use single-molecule fluorescence microscopy to measure the localization and dynamics of individual Pol Y1 molecules in live B. subtilis cells. We find that Pol Y1 and Pol Y2 have differing effects on survival and mutagenesis, but that under no circumstances is Pol Y1 strongly recruited to sites of replication upon DNA damage. This study broadens our understanding of TLS in B. subtilis, indicating that there are notable differences in TLS mechanisms across bacteria.

11
How much information is there for inferring species trees?

Milkey, A.; Chen, J.; Lewis, P. O.

2026-04-02 evolutionary biology 10.64898/2026.04.01.715836 medRxiv
Top 0.7%
0.4%

As modern phylogenomics datasets become increasingly large, it is useful to develop recommendations for how to subsample datasets for best species tree inference. Here we apply a new measure of phylogenetic information content that estimates the reduction in tree space occupied by a posterior sample of inferred trees relative to a prior sample in order to assess the effects of gene tree parameters on species tree estimation. We find that, consistent with earlier studies, when data are informative, more data result in better species tree inference. However, when data are uninformative, subsampling a dataset to include only the most informative loci may produce a better species tree sample. We perform analyses on a variety of simulated and empirical datasets.

12
EMS Mutation and SNP Detection in Intracellular Wolbachia Genomes

Penunuri, G. A.; Pepper-Tunick, E. A.; McBroome, J.; Corbett-Detig, R.; Russell, S.

2026-03-31 genomics 10.64898/2026.03.29.714874 medRxiv
Top 0.7%
0.4%

Endosymbiotic bacteria such as Wolbachia pose significant challenges to genetic and molecular investigation due to their obligate intracellular lifestyle and complex growth requirements. Current understanding of their protein biology relies heavily on functional assignments inferred by homology, which may not reflect the specific roles endosymbiont proteins play within the host. This work addresses the need for robust genetic perturbation by demonstrating the successful application and detection of chemical mutagenesis in the genome of the wMel strain of Wolbachia grown within a stably infected Drosophila melanogaster JW18 cell line. To accurately detect EMS-induced mutations in a large, unsorted cell culture population, in which mutations remain at very low allele frequency, we implemented an ultra-low error rate sequencing strategy, circle sequencing. This technique enables confident detection of EMS-induced single nucleotide polymorphisms (SNPs) that would be swamped by the inherent error rates of standard next-generation sequencing. Circle sequencing library preparations successfully revealed a clear EMS mutation signal in treated cells, characterized by a significant enrichment of canonical C/G>T/A transitions. Furthermore, we present a model explaining observed EMS mutation rates across the genome for different sequence contexts. These findings show that EMS treatment can successfully leave detectable mutation signals in intracellular genomes, and offer promise for the future development of protocols to make targeted edits in Wolbachia genomes.
Importance: As the use of intracellular symbionts for bioengineering projects grows, so does the need for foundational protocols for the genetic manipulation of intracellular genomes. Ethyl methanesulfonate (EMS), a chemical mutagen, has been a research tool for initial genomic analysis of gene function in plant and animal systems for decades and represents an established way of generating mutations for future functional testing.
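
As a rough illustration of what "enrichment of canonical C/G>T/A transitions" means in practice, the sketch below tallies the fraction of observed single-nucleotide changes that are C>T or G>A (the same change read from opposite strands). It is an assumption-laden toy, not the authors' pipeline: the input format, a list of hypothetical `(ref, alt)` base pairs, and the function names are invented for this example, and no statistical test for enrichment is performed.

```python
from typing import Iterable, Tuple

# C>T on one strand appears as G>A on the other, so both count as canonical.
CANONICAL_EMS = {("C", "T"), ("G", "A")}


def is_canonical_ems(ref: str, alt: str) -> bool:
    """True if the substitution is a canonical EMS-type transition."""
    return (ref.upper(), alt.upper()) in CANONICAL_EMS


def canonical_fraction(snps: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (ref, alt) substitutions that are C>T or G>A."""
    snps = list(snps)
    if not snps:
        raise ValueError("no substitutions supplied")
    hits = sum(is_canonical_ems(ref, alt) for ref, alt in snps)
    return hits / len(snps)


# Hypothetical calls: three canonical changes and one A>G transition.
calls = [("C", "T"), ("G", "A"), ("A", "G"), ("c", "t")]
print(canonical_fraction(calls))  # 0.75
```

Comparing this fraction between treated and untreated samples would be one simple way to look for the signal the abstract describes.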

13
Evolutionary persistence of a highly prevalent multicopy mitochondrial-derived nuclear insertion (Mega-NUMT) in Neotropical Drosophila flies

Montoliu-Nerin, M.; Strunov, A.; Heyworth, E.; Schneider, D. I.; Thoma, J.; Hua-Van, A.; Courret, C.; Klasson, L. J.; Miller, W. J.

2026-04-01 evolutionary biology 10.64898/2026.03.31.715258 medRxiv
Top 0.8%
0.3%

Background: Although strict maternal transmission of mitochondria is a general feature of animals and humans for ensuring homogeneity in mitochondrial DNA (mtDNA) across generations, exceptions were reported in the recent past. For example, some extremely rare but spectacular cases of heteroplasmy and paternal transmission in humans have questioned the universal evolutionary principle. Hence, as an alternative, the Mega-NUMT concept was coined to explain this discovery and was thereafter partly proven to exist. This concept expands on the quite common transfer of mtDNA fragments to the nucleus (NUMTs) by considering the existence of multicopy mitochondrial nuclear insertions. Mega-NUMT reports are currently restricted to a few cases in animals, including humans. However, even in humans, their detailed genomic organization, natural prevalence, and potential biological functions remain unclear.
Methodology/Principal Findings: Here, we discovered that up to 60 full-sized mitochondrial genomes are integrated into the nuclear genome of the neotropical fruit fly Drosophila paulistorum using long-read sequencing and confirmed their presence by in situ hybridization. The copies are organized in one cluster on chromosome 3, which we, due to its similarity with the Mega-NUMT concept, designated the "Dpau Mega-NUMT". Contrary to the rarity in humans, this Mega-NUMT is found at high prevalence (40%) in both long-term laboratory lines and natural D. paulistorum populations of different semispecies. Additionally, the mitochondrial copies in the Mega-NUMT cluster are phylogenetically separated from the current mitotypes of D. paulistorum. Together, these observations suggest long-term maintenance of the Mega-NUMT in nature. Hence, we propose that the Dpau Mega-NUMT may have been transferred to the nuclear genome before D. paulistorum semispecies radiation and maintained at relatively high prevalence in nature by balancing selection due to yet undetermined functions.
Conclusions/Significance: To our knowledge, this is the first verified existence and detailed dissection of a Mega-NUMT outside cats and humans. We show that Mega-NUMTs can be persistent in nature, even at high prevalence, potentially due to balancing selection. Our findings strengthen the importance of high-quality long-read sequencing technologies for deciphering complex repeat-rich genomic regions to deepen our understanding of the dynamics of genome evolution within genomic "dark matter".

14
Archaeological preservation of amelogenesis pathways

Asmundsdottir, R. D.; Troche, G.; Olsen, J. V.; Martinez de Pinillos, M.; Martinon-Torres, M.; Schrader, S.; Welker, F.

2026-03-26 evolutionary biology 10.64898/2026.03.25.713862 medRxiv
Top 0.8%
0.3%

Dental enamel, the hardest mineralised tissue in the human body, has proven to be an excellent source of ancient proteins, which have been found to survive within dental enamel for at least twenty million years. In archaeological and palaeontological contexts, the enamel proteome is generally considered to be rather small, consisting of about twelve proteins, most of which are unique to enamel. During amelogenesis these proteins undergo in vivo digestion by matrix metalloproteinase 20 (MMP20) and kallikrein 4 (KLK4) as well as serine phosphorylation by family with sequence similarity member 20-C (FAM20C) that alter their characteristics. Gaining knowledge of the previously understudied influence of amelogenesis on the archaeological human dental enamel proteome could benefit various palaeoproteomic analyses, especially in a human evolutionary context. Here we present archaeological dental enamel proteomes and explore protein cleavage patterns and sequence coverage to estimate the effects of in vivo digestion, as well as explore phosphorylation patterns. Additionally, we present a new marker based on phosphorylation to estimate genetic sex.

15
Introgression across ploidies contributes to genetic diversity in introduced urban Capsella bursa-pastoris

Wilson Brown, M. K.; Panko, R.; Platts, A. E.; Josephs, E. B.

2026-03-19 plant biology 10.64898/2026.03.17.712489 medRxiv
Top 0.8%
0.3%

Successful establishment of a species in a new range is a useful way to understand the impact of demography and selection on the evolution of globally distributed species. In particular, introductions influence genetic diversity and population structure in the introduced range in unpredictable ways. Additionally, introgressive hybridization is often associated with successful establishment in new ranges. In this study, we explore the impact of introgressive hybridization on the polyploid Capsella bursa-pastoris in the New York City metropolitan area. We find Capsella bursa-pastoris in the New York City metropolitan area likely originated from multiple introductions from northern Eurasia, and that populations across the New York City metropolitan area are generally panmictic. As with Capsella bursa-pastoris in Eurasia, we discover evidence of introgression from the diploid Capsella rubella in this population. By evaluating ancestry in regions across the genome, we find introgressed regions are rich in gene content and contribute to genetic diversity in this population. These results suggest that introgressive hybridization before introductions may buffer species from the negative effects of population bottlenecks and allow for successful establishment.

16
Ancestral state reconstruction with discrete characters using deep learning

Nagel, A. A.; Landis, M. J.

2026-03-21 evolutionary biology 10.64898/2026.03.19.712918 medRxiv
Top 0.9%
0.3%

Ancestral state reconstruction is a classical problem of broad relevance in phylogenetics. Likelihood-based methods for reconstructing ancestral states under discrete character models, such as Markov models, have proven extremely useful, but only work so long as the assumed model yields a tractable likelihood function. Unfortunately, extending a simple but tractable phylogenetic model to possess new, but biologically realistic, properties often results in an intractable likelihood, preventing its use in standard modeling tasks, including ancestral state reconstruction. The rapid advancement of deep learning offers a potential alternative to likelihood-based inference of ancestral states, particularly for models with intractable likelihoods. In this study, we modify the phylogenetic deep learning software phyddle to conduct ancestral state reconstruction. We evaluate phyddle's performance under various methodological and modeling conditions, while comparing to Bayesian inference when possible. For simple models and small trees, its performance resembles the performance of Bayesian inference, but worsens as tree size increases. While phyddle still performs adequately for more complex models, such as speciation and extinction models, the estimates differ more from Bayesian inference in comparison with simpler models. Lastly, we use phyddle to infer ancestral states for two empirical datasets: the ancestral ranges of a subclade of the genus Liolaemus and the ancestral locations for sequences from the 2014 Sierra Leone Ebola virus disease outbreak.

17
Position-dependent variant effects reveal importance of context in genomic regulation

Aninta, S. I.; Tewhey, R.; de Boer, C. G.

2026-03-18 genomics 10.64898/2026.03.17.712488 medRxiv
Top 0.9%
0.3%

Gene expression is governed by the DNA sequence, which is read out through complex interactions between transcription factors (TFs), co-activators, and chromatin. Massively Parallel Reporter Assays (MPRAs) provide a high-throughput framework for functionally characterizing how regulatory DNA sequences impact the expression of a model gene. MPRAs have also proven to be useful for measuring the effects of genetic variation, where each allele is typically tested in the center of ~200 bp of genomic context cloned into the MPRA; but the impact of variant position and local context remains largely unexplored. In this study, we systematically investigate how shifting the position of a variant within an MPRA probe influences its regulatory activity using models that predict expression in MPRAs from DNA sequence. We find that while the direction of variant effects is usually preserved across positions, the magnitude of expression changes can vary substantially depending on where the variant is placed within the construct. This positional bias appears to be largely explained by the strong position-dependent activity of TFs whose binding the variants perturb. In a subset of cases, interactions consistent with cooperativity between TFs also contribute to position-specific effects. Approximately 1% of variants appear to disrupt RNA polymerase III (Pol III) promoters within Alu elements, resulting in position-specificity because both A and B boxes are required for function and exclusion of either motif due to window shifts disrupts the variants' effects. However, we saw little evidence to support the hypothesis that the positional dependence of variant effects resulted from redundancy of motifs. Overall, our study demonstrates the complexity of cis-regulatory grammar and how it can confound the interpretation of regulatory variants.
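
The idea of shifting a variant's position within a fixed-length probe can be sketched as a sliding window over the surrounding sequence. The code below is a hypothetical illustration, not the authors' method: the function name, the `window` and `step` parameters, and the toy sequence are all assumptions, and no expression prediction is performed.

```python
from typing import Iterator, Tuple


def shifted_windows(
    sequence: str, variant_pos: int, window: int = 200, step: int = 10
) -> Iterator[Tuple[int, str]]:
    """Yield (offset, probe) pairs where the variant sits at index `offset` in the probe.

    Windows that would run past either end of `sequence` are skipped.
    """
    for offset in range(0, window, step):
        start = variant_pos - offset
        end = start + window
        if start < 0 or end > len(sequence):
            continue
        yield offset, sequence[start:end]


# Toy sequence: a single G (the "variant") at index 50 in a run of A's.
toy = "A" * 50 + "G" + "A" * 49
for offset, probe in shifted_windows(toy, variant_pos=50, window=20, step=5):
    # The variant base always lands at the requested offset within the probe.
    print(offset, probe[offset])
```

Each probe could then be scored with a sequence-to-expression model for the reference and alternate alleles to see how the predicted effect changes with the offset.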

18
Structure of the Arabidopsis receptor kinase SRF6 ectodomain determined from crystals obtained using the LRR crystallisation screen

Caregnato, A.; Hohmann, U.; Hothorn, M.

2026-03-23 plant biology 10.64898/2026.03.20.713188 medRxiv
Top 1.0%
0.3%

Plant-specific membrane receptor kinases with structurally diverse extracellular domains regulate key processes in plant growth, development, immunity and symbiosis. Structural studies of these glycoproteins are often hampered by the limited quantities in which they can be obtained. Here, we describe the LRR crystallization screen, which has enabled the successful crystallization and structure determination of multiple receptor kinase ectodomains, including ligand- and co-receptor-bound complexes. As an example, we report the 1.5 Å resolution crystal structure of the leucine-rich repeat (LRR) domain of STRUBBELIG-RECEPTOR FAMILY 6 (SRF6) from Arabidopsis thaliana. The SRF6 ectodomain contains seven LRRs and a disulfide-bond-stabilised N-terminal capping domain but lacks the canonical C-terminal cap and the N-glycosylation pattern typically observed in other family members. Previously reported protein-protein interactions between the SRF6 and SRF7 ectodomains and the receptor kinases BRI1, BRL1, BRL3, SERK3 and BIR1-3 could not be confirmed by quantitative isothermal titration calorimetry and grating-coupled interferometry assays, suggesting that these structurally conserved LRR receptor kinases may have signalling functions outside the brassinosteroid pathway. Synopsis: A crystallisation screen that has enabled the structural analysis of various extracellular domains of plant membrane receptor kinases is described together.

19
Genomes of two arid-zone marsupials uncover contrasting responses to climatic change

Feigin, C. Y.; Trybulec, E.; Ferguson, R.; Scicluna, E. L.; Sauermann, R.; Hartley, G. A.; O'Neill, R. J.; Pask, A. J.

2026-04-02 genomics 10.64898/2026.03.30.708387 medRxiv
Top 1%
0.3%

Small marsupials in the family Dasyuridae are a key component of Australia's arid and semi-arid fauna, whose high species richness is proposed to reflect an opportunity-driven adaptive radiation. Despite growing interest in this group from both ecological and evolutionary perspectives, genomic data for most species are non-existent or limited to a few marker loci. Here, we generated a chromosome-level reference genome and a de novo mitochondrial genome for the desert-dwelling Wongai ningaui (Ningaui ridei). The nuclear genome assembly is highly contiguous, with a scaffold N50 of 594.484 Mb and high BUSCO gene recovery (93.84%). Additionally, we produced a draft assembly for the related, semi-arid slender-tailed dunnart (Sminthopsis murina). We then used these assemblies to explore the demographic histories of these species. We find evidence for contrasting patterns of population growth during the late Pleistocene and early Holocene, corresponding with differences in local climate, potentially consistent with differences in optimal habitat. The new genomic resources and demographic findings presented here provide a foundation for future studies on adaptive specialisation in this group of Australian marsupials. Significance Statement: Dasyurid marsupials are the primary carnivorous and insectivorous mammals in Australia. This diverse family includes species such as the endangered Tasmanian devil (Sarcophilus harrisii) and quolls (Genus Dasyurus), as well as an emerging laboratory model species, the fat-tailed dunnart (Sminthopsis crassicaudata). Despite the species richness within dasyurids, most species remain under-studied. This is particularly true of arid and semi-arid zone species, which are often small in size, live in remote habitats and are cryptic by nature. By creating genome assemblies for two dasyurid species, this study provides resources to support a variety of phylogenetic, population genetic and evolutionary developmental lines of research.
Importantly, the study's finding that arid and semi-arid dasyurids show distinct trajectories of demographic change in response to historical climatic shifts may point to local adaptations with implications for the resilience of these species to ongoing and future climate change.
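For readers unfamiliar with the contiguity statistic quoted in the abstract above, scaffold N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The Python sketch below computes it from a list of scaffold lengths; the function name `n50` and the toy lengths are assumptions for illustration and do not come from the preprint.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are sorted from longest to shortest and accumulated until
    the running total reaches half of the total assembly length; the
    length of the scaffold that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy assembly with five scaffolds (arbitrary units).
toy_n50 = n50([2, 3, 4, 5, 6])
```

A chromosome-level assembly yields a large N50 because a handful of chromosome-sized scaffolds already account for half of the total length.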

20
k-Nearest Common Leaves algorithm for phylogenetic tree completion

Koshkarov, A.; Tahiri, N.

2026-04-04 evolutionary biology 10.64898/2026.04.02.716144 medRxiv
Top 1%
0.3%

Phylogenetic trees represent the evolutionary histories of taxa and support tasks such as clustering and Tree of Life reconstruction. Many established comparison methods, including the Robinson-Foulds (RF) distance, assume identical taxon sets. A methodological gap remains for trees with distinct but overlapping taxa. Existing approaches either prune non-common leaves, which can discard information, or complete both trees such that they share the same taxa. Completion is more comprehensive, but current methods typically ignore branch lengths, which are essential for identifying evolutionary patterns. This paper introduces k-Nearest Common Leaves (k-NCL), an algorithm for completing rooted phylogenetic trees defined on different but overlapping taxa. The method uses branch lengths and topological characteristics and does not rely on a specific distance measure. The k-NCL algorithm is designed to preserve evolutionary relationships in the trees under comparison. The running time is O(n^2), where n is the size of the union of the two leaf sets. Additional properties include preservation of original distances and topology, symmetry, and uniqueness of the completion. Implemented in Python, k-NCL is evaluated on biological datasets of amphibians, birds, mammals, and sharks. Experimental results show that RF combined with k-NCL improves phylogenetic tree clustering performance compared to the RF(+) tree completion approach. Availability and implementation: An open-source implementation of k-NCL in Python and the datasets used in this study are available at https://github.com/tahiri-lab/KNCL.
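To make the pruning baseline mentioned in the abstract above concrete, the Python sketch below computes a rooted Robinson-Foulds distance after restricting both trees to their common leaves. This is the approach the abstract says can discard information, not the k-NCL algorithm itself, whose details are not given here. Trees are represented as nested tuples of leaf names, and all function names are assumptions for this example rather than the API of the KNCL repository.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple rooted tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clusters(tree, keep):
    """Return the clades of `tree` restricted to the leaves in `keep`.

    Each clade is a frozenset of leaf names. Clades left with fewer than
    two retained leaves are dropped, which matches pruning the other
    leaves and suppressing the resulting single-child nodes.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node]) & keep
        below = frozenset().union(*(visit(child) for child in node))
        if len(below) >= 2:
            found.add(below)
        return below

    visit(tree)
    return found


def rf_on_common_leaves(tree_a, tree_b):
    """Rooted RF distance computed only on the leaves shared by both trees."""
    common = leaves(tree_a) & leaves(tree_b)
    return len(clusters(tree_a, common) ^ clusters(tree_b, common))


# Two toy trees sharing the leaves A, B, C and D; E and F are unique.
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "F"))
toy_distance = rf_on_common_leaves(tree_a, tree_b)
```

In this toy case the leaves E and F play no part in the result, which is the loss of information that a completion method aims to avoid.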